Performance of Cross Validation in Tree-Based Models

Authors

  • Seoung Bum Kim
  • Xiaoming Huo
  • Kwok-Leung Tsui

Abstract

Cross-validation (CV) is widely used to measure the performance of a classifier. The main purpose of this study is to explore the behavior of CV in tree-based models. We report experimental studies that compare a cross-validated tree classifier with an oracle classifier, that is, an ideal classifier derived from knowledge of the underlying distributions. The main observation is that the differences between the testing and training errors of a cross-validated tree classifier and of an oracle classifier are empirically related by a linear regression. The slope and the R² of these regression models are used as performance measures for a cross-validated tree classifier. Moreover, simulations reveal that the performance of a cross-validated tree classifier depends on the geometry, the parameters of the underlying distributions, and the sample size. These observations help explain and justify the behavior of CV in tree-based models.
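The abstract does not spell out the simulation protocol, so the following is only a minimal sketch under loudly stated assumptions, not the authors' method. It assumes two equal-prior univariate Gaussian classes N(-mu, 1) and N(+mu, 1), for which the oracle (Bayes) rule is simply "predict class 1 when x > 0"; a depth-one tree (decision stump) stands in for the tree classifier. It also assumes one possible reading of the two "gaps": for the oracle, test error minus its error on the training sample; for the tree, the test error of a stump fitted on the whole training sample minus its k-fold CV estimate. The tree gap is then regressed on the oracle gap across replications, and the slope and R² are reported. All function names and default parameters (`simulate`, `fit_stump`, `run_experiment`, `mu`, `n`, `k`, `reps`) are hypothetical.

```python
import random
from statistics import fmean


def simulate(n, mu, rng):
    """Assumed setup: class 0 ~ N(-mu, 1), class 1 ~ N(+mu, 1), equal priors."""
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = rng.gauss(mu if y == 1 else -mu, 1.0)
        data.append((x, y))
    return data


def oracle_predict(x):
    """Bayes (oracle) rule for the assumed setup: predict class 1 exactly when x > 0."""
    return 1 if x > 0 else 0


def error_rate(predict, data):
    """Fraction of points in data that predict misclassifies."""
    return fmean(1.0 if predict(x) != y else 0.0 for x, y in data)


def fit_stump(data):
    """Depth-1 tree: choose threshold t minimising training error of 'predict 1 if x > t'."""
    xs = sorted(x for x, _ in data)
    # Candidate thresholds: one below every point, plus midpoints between neighbours.
    candidates = [xs[0] - 1.0] + [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    best_t = min(candidates, key=lambda t: sum(int(x > t) != y for x, y in data))
    return lambda x: 1 if x > best_t else 0


def cv_error(data, k, fit):
    """k-fold CV: fit on k-1 folds, score on the held-out fold, average over folds."""
    folds = [data[i::k] for i in range(k)]
    errs = []
    for i in range(k):
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        errs.append(error_rate(fit(train), folds[i]))
    return fmean(errs)


def slope_and_r2(xs, ys):
    """Least-squares slope of ys on xs and the coefficient of determination R^2."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx, sxy ** 2 / (sxx * syy)


def run_experiment(mu=1.0, n=100, k=5, reps=30, n_test=2000, seed=0):
    """Regress the (assumed) tree gap on the (assumed) oracle gap over replications."""
    rng = random.Random(seed)
    oracle_gaps, tree_gaps = [], []
    for _ in range(reps):
        train = simulate(n, mu, rng)
        test = simulate(n_test, mu, rng)
        # Oracle gap: error on fresh test data minus error on the training sample.
        oracle_gaps.append(error_rate(oracle_predict, test) - error_rate(oracle_predict, train))
        # Tree gap: test error of a stump fitted on all training data minus its CV estimate.
        stump = fit_stump(train)
        tree_gaps.append(error_rate(stump, test) - cv_error(train, k, fit_stump))
    return slope_and_r2(oracle_gaps, tree_gaps)


if __name__ == "__main__":
    slope, r2 = run_experiment(mu=1.0)
    print(f"slope = {slope:.3f}, R^2 = {r2:.3f}")
```

Varying `mu` (class separation) and `n` (sample size) in `run_experiment` is one simple way to probe how the slope and R² respond to the distribution parameters and the sample size, in the spirit of the comparisons the abstract describes.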


Similar sources

Use of classification tree methods to study the habitat requirements of tench (Tinca tinca) (L., 1758)

Classification trees (J48) were induced to predict the habitat requirements of tench (Tinca tinca). A total of 306 datasets, collected for this fish over 8 years in the river basins of Flanders (Belgium), were used. The input variables consisted of structural-habitat variables (width, depth, gradient slope and distance from the source) and physicochemical variables (pH, dissolved oxygen, water temperature and electric conduct...

Presenting a Model for Predicting Tax Evasion of Guilds Based on Data Mining Technique

In this research, considering the importance of the topic and the gap in previous research, a model for predicting tax evasion by guilds based on data mining techniques is presented. The analyzed data comprise 5600 tax files of all trades with tax codes in Qazvin province during the years 2013-2018. Guild tax files fall into five tax groups, including the guild group o...

Real-time quality monitoring in debutanizer column with regression tree and ANFIS

A debutanizer column is an integral part of any petroleum refinery. Online composition monitoring of debutanizer column outlet streams is highly desirable in order to maximize the production of liquefied petroleum gas. In this article, data-driven models for debutanizer column are developed for real-time composition monitoring. The dataset used has seven process variables as inputs and the outp...

Identifying Student Behavior for Improving Online Course Performance with Machine Learning

In this study, we investigate the correlation between student behavior and performance in online courses. Based on the web logs and syllabus of a course, we extract features that characterize student behavior. Using machine learning algorithms, we build models to predict performance at the end of the period. Furthermore, we identify important behaviors and behavior combinations in the models. The res...

Statistical process control for validating a classification tree model for predicting mortality - A novel approach towards temporal validation

Prediction models are postulated as useful tools to support tasks such as clinical decision making and benchmarking. In particular, classification tree models have enjoyed much interest in the Biomedical Informatics literature. However, their prospective predictive performance over the course of time has not been investigated. In this paper we suggest and apply statistical process control metho...


Publication date: 2005